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S ' Abstract 

\o ; 

CD \ The advent of new experimental genomic technologies and the mas- 

sive increase of DNA sequence information is helping researchers bet- 
ter understand how our genes work. Recently, experiments on mRNA 
abundance (gene expression) have revealed that gene expression shows 
a stationary organization described by a power-law distribution (scale- 
O^' free organization) (i.e., gene expression k decays as k~'^), which is 

^ . highly conserved in all the major five kingdoms of life, from Bacteria to 

k><( \ Human. An underlying gene expression dynamics "rich-travel-more" 

^ ' was suggested to recover that evolutional conservation of transcrip- 

tional organization. Here we propose a constructive approach to gene 
expression dynamics with larger scope. Our gene expression construc- 
tion restores the stationary state, predicts the power-law exponent for 
different organisms with natural explanation for small correction at 
high and low expression levels, describes the intermediate state dy- 
namics (time finite) and elucidates the gene expression stability. This 
approach requires only one assumption: Markov property. 



*These authors contributed equally to this work. 



1 Introduction 

DNA encodes tens of thousands genes (around 30.000 genes in a human cell), 
which can be expressed as messenger ribonuclei acid (mRNA) transcripts 
and then translated into protein. Proteins are essential for most biochemical 
processes in the cell and execute almost all cell functions. Therefore, the 
protein study is one of the central issues in our post-genomic era. 

Protein abundance in a cell depends on many factors. One of them is 
whether the respective gene is expressed (transcribed) or not, and how fast 
it is expressed. However, direct measures of protein abundance are diffi- 
cult technically. With the advent of new experimental genomic technologies 
as micrroarray chips, simultaneous expression levels of thousands of genes 
can be measured and monitored under different conditions jU |2] . Though 
the relationship between gene expression and abundance of proteins in the 
cells is not completely determined, it is possible to make precise estimations 
about the presence of proteins from gene expressions measures. Hence, the 
experimental study and theoretical analysis of gene expression organization 
in different organisms 013], and its underlying dynamics are important in 
biology. 

Evolutional gene expression organization has recently been studied by 
measuring the mRNA abundance (gene expression) for tens of thousands of 
genes in parallel (GeneChips arrays) through several related experiments j3]. 
From E. Coli to Homo sapies, the gene distribution p{k) (frequency of genes 
that have an amount of expression k) was revealed as a stationary scale-free 
organization [S] (i.e., k decays as a power-law k~'^ [HIIZI )• This organization 
can be recovered by means of the underlying dynamics "rich-travel-more" 
explained in [Hj after some assumptions are considered. 

Here, in this paper, we propose an innovative construction of gene ex- 
pression dynamics, which spontaneously re-builds the observed stationary 
power-law organization, by making use of only one natural assumption: gene 
expression has "short memory" (Markov property). In addition, our gene 
expression construction predicts the power-law exponent for different organ- 
isms with natural explanation for small correction at high and low expression 
levels, describes the intermediate state dynamics (time finite) and elucidates 
the gene expression stability. 

The paper is organised as follows. In Section 2, we describe the methods 
(subsection 2.1) and present our results (subsection 2.2). In Section 3, we 
summarize our work. 



2 Methods and Result 

2.1 Methods 

2.1.1 Natural construction 

Why Markov property. If we say that the system has a Markov property, 
we mean that the future is governed by the present and does not depend on 
the past. Our model is based on Markov property. This assumption is very 
natural for biology since all biological systems obey the physical laws, which 
manifest Markov property. 

Why probability. On the other hand, it seems very natural to use stochas- 
tic models because cells are so complex systems that we can not precisely 
analyse all the variables. Therefore, it is not sufficient to use only deter- 
ministic models and we must approach to the problem by using stochastic 
processes theory. 

In Fig. ^ we sketch our construction at different level of knowledge. 
Next subsections explain in detail each pyramidal level, from general and 
fundamental concepts to particular and operative solutions (stationary orga- 
nization and dynamics). 

2.1.2 Markov property and Master equation. 

Markov property. Let {Xt, < t < oo} be a stochastic process. For 
(t„ > ■ ■ ■ > to); the conditional probability density function 

P{Xn, tn\Xn-l, tn-l] ■ ] Xq, to) = p{Xt„ = Xn\Xt„_-^ = Xn-l, ■ ■ ■ ', Xfg = Xq) 

is defined as usual manner. It is said that a stochastic process has "Markov 
property" , when the condition 

PyXjiy t^nl-^n— 1) t-n— li ' ' ' i -^0) ''0/ P\Xn, Zn\Xfi—l, t-n— ij \^) 

holds for arbitrary t„ > ■ ■ ■ > to- Roughly speaking, Markov process means 
that the future does not depend on the past, but only on the present time. 
In what follows, we assume that the probability density p{x,t\xo,to) has the 
time translation invariance p{x, t\xo, to) = p{x, t + a\xo, to + a) for arbitrary 
a. 

Our only one assumption is that "gene expression has the Markov prop- 
erty" . More precisely, we assume that the gene expression level is denoted by 



the stochastic process with Markov property Xf. This assumption is quite 
natural, because the gene expression system belongs to nature, the nature 
obeys the physical laws and the physical laws manifest the Markov property 
(see Fig. 1). 

Master equation. For the matter of convenience, we write p{x,t) for 
p{x,t\xo,tQ). Then, if the stochastic process has the Markov property (Eq. 
dH)), the conditional probability density function p(a;, t\xo, to) obeys Kramers- 
Moyal expansion (which is mathematically equivalent to the Master equa- 
tion): 

n=l 

where 

a„(x) = lim - / (y - xYT^y, x)dy. (3) 

Here T^iy, x) is an instantaneous transition probability defined by T^ijj, x) = 
p{y,t + e\x,t) for sufficiently small e. Details of the proof can be found in 
Ref. 0. 

In the context of gene expression level, this equation (J2I) represents the 
dynamics of abundance of mRNA (gene expression) . 

How to use Master equation. We can obtain the dynamics of probabil- 
ity density p{x, t \ xq, to) for any time t, from experimental data of instanta- 
neous transition probability T^{y, x) (e is sufficiently small and fixed), by the 
following procedure: 

(i) Given the experimental data of instantaneous transition probability 
T^{y,x) (e is sufficiently small), we obtain an{x) by using Eq. Q.^ 

(a) By inserting an{x) into Kramers- Moyal expansion Q (Master equa- 
tion) and by solving this PDE (partial differential equation), we can 
obtain p{x,t \ xq, to)- 



^Notice that we do not need the whole data T^{y, x) for any e. The necessary data is 
T^{y, x) at a sufficiently small fixed e. 



In general, it is difficult to solve Kramers-Moyal expansion (Eq.(j2I)) (Mas- 
ter equation) in the above second process (ii). However, for particular cases 
we can solve this equation analytically, which will be explained in next sec- 
tions. 

In the sequel, we use notations ki (i=l,2) for gene expression levels in- 
stead of x,y. For example, T^{k2,ki) denotes the instantaneous transition 
probability of expression level from ki to /c2- 

2.1.3 Initial instantaneous transition data T^{k2,ki) 

Recently, gene expression instantaneous transition probability (ITP) T^{k2, ki) 
for individual genes, from expression level ki to expression level k2 along dif- 
ferent conditions, was obtained experimentally 1^. The following expression 

T,{k2,ki) = , exp -— '- , 4 

A/27r(crA;2) e '- ^'^^ -' 

with /i = —0.07, (7 = 0.25 and e = 0.01, faithfully reproduces the exper- 
imental data of ITP of S.Cerevisiae, shown in [S]. Fig. |21 shows this ITP 
Te{k2,ki). It represents how expression level fci (horizontal axis) changes 
into expression level ^2 (vertical axis) under a small time interval. Colors 
represent gradual changes of values of instantaneous transition probability, 
from red (maximum) to light blue (minimum). 

In what follows, our aim is to compute probability density p{k, t) from 
the input data T^{k2, ki), using the procedure given in the previous section. 

2.1.4 Computation of a„(A;i) 

In this section, we compute an(^i) from the initial data of ITP T^{k2,ki). 
We insert Eq. (jlj) into 

<iki) = \ j{k2 - k,TT,{k2, k,)dk2. (5) 

Then, after some computation, we have 

-^e-^ _(^v/^7^+(^-aV)e + 0(6^/2))". (6) 
-oo VTT e 2 



After we take limit e — *> 0, we finally obtain: 



aiih) ■■ 


= fiki, 




02(^1) - 


= {crh)\ 




aiih) -- 


= (z 


>3) 



Here we remark the following. Although we have fixed the value of e at 
e = 0.01 in Eq. (JH), the above limiting procedure e — > is still valid as 
a sufficiently good approximation. Futhermore, it is remarkable that all Oj 

(i = 3, 4, ■ ■ ■ ) vanish. 

2.1.5 Emergence of Kolmogorov equation and SPDE 

In the last section, we find out that ai{ki) = fiki, 02(^1) = (o"^i)^ and 
Oj(^i) = (i > 3), where /x = —0.07 and a = 0.25. However, in order to 
keep the argument more general, we still consider ai{ki) and 02(^1) arbitrary 
while we assume that Oj = vanish for i > 3. 

Kolmogorov Equation. If ai{ki) = vanish for i > 3, then Kramers- 
Moyal expansion ^ (Master equation) becomes Kolmogorov equation: 

^^ = -^{a,{k)p{k,t)} + l^{a,{k)p{k,t)}, (7) 

where p{k, t) = p{k, t\kQ, to). 

SPDE. In addition, it is known that the Kolmogorov equation is equivalent 
to the following stochastic partial differential equation (SPDE): 



dXt = a{Xt)dt + (5{Xt)dWt, (8) 

where stochastic variable Xt denotes the gene expression level, a{Xt) = 
ai{Xt) denotes the instantaneous transition of the average of the gene ex- 
pression level per unit time, (3{Xt) = yoTC^f) denotes the instantaneous 
transition of variance of the gene expression level per unit time and Wt de- 
notes the Wiener process ^. 



2.1.6 Analysis of our model 

From general analysis of Kolmogorov equation, we return to our original 
situation where ai{ki) = fiki, a2{ki) = {akiY and ai{ki) = vanish for 
i > 3. Then, by imposing the condition a{Xt) = fiXt and l3{Xt) = crXt on 
the SPDE (Eq. ©), we obtain 



dXt = fxXtdt + aXtdWf (9) 

This reduced model is also known as Black-Scholes model in financial engi- 
neering (see Ref. ^HI)- This SPDE directly gives useful information about 
the properties of the model as follows: 

(i) Negative fi. From subsection 12.1. 31 we know that /i is negative. This 
means that the gene expression level has a slightly decreasing tendency, 
which seems natural for biological systems. 

(2) Rich-travel-more. This mechanism, which regenerates the stationary 
power-laws, was introduced by |^ . In our construction, this mechanism 
is not concealed at all, and it is evident just by looking at the second 
term (aXtdWt) in Eq. ©. 

2.1.7 Gene expression dynamical solution 

By using Ito formula, we can solve the SPDE (Eq. Q) and derive the 
dynamics of gene expression (the expression-temporal probability density ) 



where /i = —0.07 and a = 0.25.^ From this equation, we can compute the 
probability of finding that the gene expression level takes the value k at 
arbitrary time t. This equation explains how the gene expression behaves as 
a function of time. This is our main result. 



^Certainly, we can formally obtain this solution (Eq. fTIHl ) by substituting the initial 
instantaneous transition probability Tt{ki,k2) (Eq. Q) ) into p(j/, t + e|x, t) = T^{y,x). 
However, this argument is not correct. 



2.2 Interpretation of results 

2.2.1 Results when time is infinity 

The foUowing results are obtained as asymptotic behavior of the system for 
large time: 

(i) Model shows an almost stationary state : In Eq. fllOj) . if we take limit 
t — > oo, then we re-build the power-law distribution observed experimentally 

p{k) (X k i , (11) 

where p{k) = \im.t^oo p(yk,t\ko,tQ) . Here, we remark that p{k,t\ko,to) essen- 
tially does not depend on both ko and to if we take limit t — > 00. The result 
indicates how the system is organized in a stationary state after long time^. 

Once that scale-free organization has been reached, it becomes stationary. 
However, it is worth noticing that when the system continues evolving (time 
passing), the absolute values of gene expression level still continue decreasing 
or increasing with the time in order to keep the total probability equal to 
one. However, the global scale-free organization is invariant. Due to this 
reason, we also call it almost stationary state. We can see the probability 
distribution in Fig. IHl 

(ii) Model predicts power-law organization : It includes prediction of the 
scale- free exponent 7 for each different organism (or dataset). As we see 
in Eq. (fTT|). the expression of 7 is — ^^^ ~ . By knowing fx and a from 
experimental data, it is straightforward to obtain the respective value of 7. 
In particular, we have obtained 7 = 1.81 for the organism S. Cerevisae in 
good agreement with experimental results in |5j, by using /i = —0.07 and 
a = 0.25. The same procedure can be applied for obtaining 7 values for 
different organisms. 

(Hi) Model predicts small corrections in the scale-free distribution : As we 
can see from Fig. El the results predicted by our construction differ slightly 
from a scale-free distribution. It contains a small curvature, reflecting the 
original Log-Normal distribution. It is worth noticing that this curvature is 
also found at the experimental data from |5j for low expression level (high 



^This result Eq. (|ll|l is different from that of [Sj. It is because they assume fi — Q and 



'^' ' — in j^. In our approach, fj, is obtained from experimental data and we do not 

dp(x,t) n 

assume g^' ' = 



probability) and large expression level (very low probability). Hence, the 
agreement with data from ^ is very good. 

(iv) Model elucidates robustness (stability) : It means that, by following 
our construction, any gene expression level k^ at the initial time to will evolve 
to the same stationary final state, self-organizing like a scale-free distribution. 
Hence, the final state is independent and stable under changes done at the 
initial state. 

It is worth remarking that all these findings are supported by experimen- 
tal data 0. 

2.2.2 Intermediate gene expression states (time finite) 

From Eq. ()10|) , we can compute the probability of finding expression level k 
at arbitrary time t. Namely, this equation explains how the gene expression 
changes over time. We show the evolving dynamics of gene expression in Fig. 
13] (a, b, c) (from left to right): (a) initial time, (b) intermediate time or finite 
fixed time (now the system is evolving), (c) time is very large (final state). 
These results are in agreement with the numerical simulated data shown in 
Supplementary Material of ref. ^. 

3 Conclusions 

We have developed a constructive approach to gene expression dynamics 
based on only one theoretical assumption: Markow property. Our gene ex- 
pression construction restores the stationary state, predicts the power-law ex- 
ponent for different organisms with natural explanation for small correction 
at high and low expression levels, describes the intermediate state dynamics 
(time finite) and elucidates the gene expression stability. Furthermore, our 
model is in agreement with the experimental data shown in 0. 

Many extensive analyses have recently been done for studying the organi- 
zation and topology of biological networks [TTl [Tallin]- However, a theoretical 
and rigorous analysis for explaining the dynamical complexity of these bio- 
logical networks has not been developed. An adequate and self-consistent 
theoretical framework would be illuminating and fruitful for understanding 
the cell dynamics. 

Our construction may play the same role for the gene expression dynamics 
as the Schrodinger equation in Quantum Mechanics or the Conservation of 



Energy and Newton's Law in Classical Mechanics. By following our construc- 
tion, it seems plausible to predict the future behaviour of a gene expression 
dynamical system. This marvelous gene expressions mechanics may provide 
a better understanding of cell dynamics. As a future work, this approach 
may be extended for studying multi-gene correlations dynamics, which hold 
interesting information of gene functionality. 
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Figure 1: Scheme of the fundamental levels of our construction. From Markov 
property (or equivalently Master Equation) to scale-free distribution. 
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Figure 2: Transition probability between two states of gene expression T^{k2, ki). 
We use analytical expression for reproducing qualitatively the same behaviour 
observed experimentally by |3|. Colors represent gradual changes of values of 
transition probability: white (maximum), red (medium), light blue (minimum). 
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Figure 3: Gene distribution p{k) in vertical axis in terms of gene expression level 
k in horizontal axis. Scale is log-log. We plot the stationary state corresponding 
to the Log-Normal distribution. As we see the distribution is almost scale-free, 
with small curvature correction. 
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Figure 4: We show the dynamics of gene expression p{k,t,kQ,to) (vertical axis). 
y (resp. x) axis represents ki (resp. ^2) expression level. At the top, from left to 
right, three different states: initial state (t = 0), intermediate state (intermediate 
t), and final state (t — > 00). At the bottom, the same representation by using 3D 
view. Colors represent gradual changes of values of transition probability: white 
(maximum), red (medium), light blue (minimum). 
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